A NEW APPROACH IN DATA REDUCTION: PROPER HANDLING OF 
RANDOM ERRORS AND IMAGE DISTORTIONS 

RESUMEN 

Los procesos de reducción de datos tienen como objetivo minimizar el impacto que las imperfecciones en la 
adquisición de los mismos producen en la obtención de medidas de interés para el astrónomo. Para conseguir este 
objetivo, es necesario realizar manipulaciones aritméticas, utilizando imágenes de datos y de calibración. Por 
otro lado, la interpretación correcta de las medidas sólo es posible cuando existe una determinación precisa de 
los errores asociados. En este trabajo discutimos diferentes estrategias posibles para obtener determinaciones 
realistas de los errores aleatorios finales. En concreto, destacamos los beneficios que conlleva considerar el 
proceso de reducción de datos como la caracterización completa de las imágenes originales, pero evitando, tanto 
como sea posible, la alteración aritmética de las imágenes hasta el momento de su análisis final y obtención de 
medidas definitivas. Esta filosofía de reducción será utilizada en la reducción de datos de ELMER y de EMIR. 

ABSTRACT 

Data reduction procedures aim to minimize the impact of data acquisition imperfections on the 
measurement of data properties with a scientific meaning for the astronomer. To achieve this purpose, appropriate 
arithmetic manipulations with data and calibration frames must be performed. Furthermore, a full 
understanding of all the possible measurements relies on a solid constraint of their associated errors. We discuss 
different strategies for obtaining realistic determinations of final random errors. In particular, we highlight 
the benefits of considering the data reduction process as the full characterization of the raw-data frames, while 
avoiding, as far as possible, the arithmetic manipulation of those data until the final measurement and analysis of 
the image properties. This philosophy will be used in the pipeline data reduction for ELMER and EMIR. 

1. INTRODUCTION 

The Gran Telescopio Canarias (GTC), as one of 
the best human tools to explore and reveal the 
unknown Universe, will give access, in conjunction 
with its pioneering instrumentation, to rather faint 
and/or distant objects that are in practice 
inaccessible to 4 m class telescopes. For that reason, 
very high signal-to-noise ratios are expected to be 
uncommon in most cases. Under these circumstances, 
an accurate error estimation is essential to 
guarantee the reliability of the measurements. 

Although there are no magic recipes to quantify 
systematic errors in a general situation, where a 
case-by-case solution must be sought, the situation 
is, fortunately, not so bad concerning random errors. 
In principle, the latter can be measured and properly 
handled using typical statistical tools. In this 
contribution we discuss the benefits and drawbacks of 
different methods to quantify random errors in the 
context of data reduction pipelines. After examining 
the possibilities, we conclude that the classic 
reduction procedure is not perfectly suited for error 
handling. In this sense, the responsibility for the 
completion of the more complex data reduction steps 
must be transferred to the analysis tools. For this 
approach to be possible, additional information must 
also be provided to those tools, which in turn implies 
that the reduction process should be modified in order 
to produce that information. A discussion concerning 
the treatment of systematic errors is beyond the scope 
of this paper. 

2. THE CLASSIC REDUCTION PROCEDURE 
Fig. 1. Classic reduction procedure. 

2.1. Three methods to quantify random errors 

In a classic view (see Figure 1), a typical data 
reduction pipeline can be considered as a collection 
of filters, each of which transforms input images into 
new output images, after performing some kind of 
arithmetic manipulation and making use of addi- 
tional measurements and calibration frames when 
required. Under this picture, three different ap- 
proaches can in principle be employed to determine 
random errors in completely reduced images: 

i) Comparison of independent repeated measurements. 
This is one of the simplest and most straightforward 
ways to estimate errors, since, in practice, errors 
are neither computed nor handled through the 
reduction procedure. The only requirement is the 
availability of a sufficiently large number of 
independent measurements. Although even the flux 
collected by each independent pixel in a detector can 
be considered as such a measurement (for example when 
determining the sky flux error in direct imaging), in 
most cases this method requires the comparison of 
different frames. For that reason, and given that for 
many purposes it may be extremely expensive in terms 
of observing time, its applicability in a general 
situation seems rather unlikely. 

ii) First principles and brute force: error 
bootstrapping. Making use of the knowledge concerning 
how photo-electrons are generated (expected 
statistical distribution of photon arrival into each 
pixel, detector gain and read-out noise), it is 
possible to generate an error image associated with 
each raw-data frame. By means of error bootstrapping 
via Monte Carlo simulations, new instances of the 
initial raw-data frame are simulated and can be 
completely reduced as if they were real observations. 
The comparison of the measurements performed over the 
whole set of reduced simulated observations then 
provides a good estimation of the final errors. 
However, although this method overcomes the problem 
of wasting observing time, it can also be terribly 
expensive, but now in terms of computing time. 

iii) First principles and elegance: parallel 
reduction of error and data frames. Instead of wasting 
either observing or computing time, it is also 
possible to feed the data reduction pipeline with both 
the original raw-data frame and its associated error 
frame (computed from first principles), and proceed 
only once throughout the whole reduction process. 
In this case every single arithmetic manipulation 
performed over the data image must be translated, 
using the law of propagation of errors, into parallel 
manipulations of the error image. Unfortunately, 
typical astronomical data reduction packages (e.g. 
IRAF, MIDAS, etc.) do not consider random error 
propagation as a default operation and, thus, some 
kind of additional programming is unavoidable. 
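
Methods ii) and iii) can be compared on a toy case. The Python sketch below is only an illustration under stated assumptions: the gain, read-out noise, flat-field values and all function names are hypothetical, the flat field is treated as noise-free, and photon noise is approximated by a Gaussian in the simulation. It builds a variance frame from first principles, propagates it analytically through a flat-field division, and checks the result against a Monte Carlo bootstrap of the same step.

```python
import numpy as np


def variance_frame(raw_adu, gain, readout_noise_e):
    """First-principles variance (ADU^2) of a raw frame.

    Assumes Poisson photon noise plus Gaussian read-out noise;
    gain is in electrons/ADU, readout_noise_e in electrons.
    """
    electrons = np.clip(raw_adu, 0, None) * gain
    return (electrons + readout_noise_e**2) / gain**2


def flatfield_with_errors(data, var, flat):
    """Method iii): divide by the flat and propagate the variance in parallel."""
    return data / flat, var / flat**2


def flatfield_bootstrap(data, var, flat, n_sim=2000, seed=0):
    """Method ii): simulate raw frames, reduce each one, measure the scatter."""
    rng = np.random.default_rng(seed)
    # Gaussian approximation to the noise of each pixel (an assumption).
    sims = rng.normal(data, np.sqrt(var), size=(n_sim,) + data.shape)
    reduced = sims / flat
    return reduced.var(axis=0, ddof=1)


# Hypothetical 4x4 frame: constant raw counts and a smooth flat field.
raw = np.full((4, 4), 1000.0)
flat = np.linspace(0.8, 1.2, 16).reshape(4, 4)

var_raw = variance_frame(raw, gain=2.0, readout_noise_e=5.0)
reduced, var_analytic = flatfield_with_errors(raw, var_raw, flat)
var_mc = flatfield_bootstrap(raw, var_raw, flat)
```

For a single division the two estimates should agree to within the Monte Carlo noise; the bootstrap simply costs n_sim reductions instead of one.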

2.2. Error correlation — A real problem 

Although each of the three methods described 
above is suitable for use in different circumstances, 
the third approach is undoubtedly the one that, in 
practice, can be used in the most general situation. 
In fact, once the appropriate data reduction tool is 
available, the parallel reduction of data and error 
frames is the only way to proceed when observing or 
computing time demands are prohibitively high. 
However, due to the unavoidable fact that the 
information collected by detectors is physically 
sampled in pixels, this approach collides with a major 
problem: errors become correlated as soon as one 
introduces image manipulations involving rebinning 
or non-integer pixel shifts of data. A naive use of the 
analysis tools would neglect the effect of covariance 
terms, leading to dangerously underestimated final 
random errors. Actually, this is likely the most 
common situation since, in principle, the classic 
reduction operates as a black box unless specifically 
modified to do otherwise. Unfortunately, as soon as 
one accumulates a few reduction steps that increase 
the correlation between adjacent pixels (e.g. image 
rectification when correcting for geometric 
distortions, wavelength calibration into a linear 
scale, etc.), the number of covariance terms grows too 
rapidly for it to be feasible to store and propagate 
all the new coefficients for every single pixel of an 
image. 

Fig. 2. Modified reduction procedure. 
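
The loss of independence discussed in this section can be made explicit with a toy example. The sketch below assumes a one-dimensional row of pixels with equal, uncorrelated errors and a half-pixel shift performed by linear interpolation; the function name and the numbers are illustrative assumptions, and real resampling kernels are usually wider. It builds the full covariance matrix after the shift and compares the variance of the summed flux with and without the covariance terms.

```python
import numpy as np


def linear_shift_matrix(n_pix, shift):
    """Weights of a linear-interpolation shift by a fraction 0 <= shift < 1.

    Output pixel i is (1 - shift) * x[i] + shift * x[i + 1]; the last
    output pixel is dropped, so the matrix has shape (n_pix - 1, n_pix).
    """
    w = np.zeros((n_pix - 1, n_pix))
    idx = np.arange(n_pix - 1)
    w[idx, idx] = 1.0 - shift
    w[idx, idx + 1] = shift
    return w


n_pix, sigma = 50, 1.0
cov_in = sigma**2 * np.eye(n_pix)      # uncorrelated input pixels
w = linear_shift_matrix(n_pix, shift=0.5)
cov_out = w @ cov_in @ w.T             # full covariance after the shift

# Variance of the flux summed over all shifted pixels.
var_naive = np.trace(cov_out)          # diagonal only: covariances ignored
var_true = cov_out.sum()               # all covariance terms included
```

In this configuration the naive estimate (24.5) is roughly half of the true variance (48.5), so the error on the summed flux would be underestimated by a factor close to the square root of two.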

3. THE MODIFIED REDUCTION PROCEDURE 

3.1. Image Characterization 

Obviously, the problem can be circumvented if 
one prevents its emergence, i.e. if one does not allow 
the data reduction process to introduce correlation 
between neighboring pixels before the final analysis. 
In other words, if all the reduction steps that lead 
to error correlation are performed in a single step 
during the measurement of the image properties with 
a scientific meaning for the astronomer, there are 
no previous covariance terms to be concerned with. 
Whether this is actually possible or not may depend 
on the type of reduction steps under consideration. 
In any case, a change in the philosophy of the classic 
reduction procedure can greatly help in alleviating 
the problem. The core of this change consists in 
considering the reduction steps that give rise to 
pixel correlation not as filters that take input 
images and generate new versions of them after 
applying some kind of arithmetic manipulation, but 
as filters that properly characterize the image 
properties without modifying those input images. 

More precisely, the reduction steps can be 
segregated into two groups (see Figure 2): a) simple 
steps, which require neither data rebinning nor 
non-integer pixel shifts of data; and b) complex 
steps, those liable to introduce error correlation 
between adjacent pixels. The former may be operated 
as in a classic reduction, since their application 
does not introduce covariance terms. However, the 
complex steps are only allowed to determine the image 
properties that one would need to actually perform 
the correction. For the more common situations, these 
characterizations may be simple polynomials (in order 
to model geometric distortions, non-linear wavelength 
calibration scales, the dependence of differential 
refraction on wavelength, etc.). Under this view, the 
end product of the modified reduction procedure 
consists of a slightly modified version of the 
raw-data frames (after quite simple arithmetic 
manipulations) and an associated collection of image 
characterizations. 
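
A characterization of this kind can be as small as a handful of coefficients. The following sketch, in Python with NumPy, fits a quadratic wavelength solution to arc-line positions and stores the coefficients next to the frame instead of resampling it. The line list, the polynomial degree and the dictionary layout are made up for illustration and do not describe the ELMER or EMIR pipelines.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical arc-line identifications: pixel centroids and the
# wavelengths (in angstrom) they are assumed to correspond to.
true_solution = Polynomial([5000.0, 2.0, 1.0e-4])
pixel_centroids = np.array([50.0, 210.0, 400.0, 640.0, 830.0, 990.0])
wavelengths = true_solution(pixel_centroids)

# Fit lambda(pixel); the fit is done on a scaled domain for stability,
# and convert() returns coefficients in plain pixel units.
fit = Polynomial.fit(pixel_centroids, wavelengths, deg=2)
coeffs = fit.convert().coef

# The frame itself is left untouched; only this description is stored.
characterization = {"type": "wavelength_calibration", "coef": coeffs}
```

Keeping the coefficients rather than a rebinned image defers the resampling, and therefore the pixel correlation, to the moment of measurement.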

Fig. 3. Comparison between classic (upper panel) and modified (lower panel) reduction procedures. 



3.2. Modus Operandi 

Clearly, at any moment it is possible to combine 
the result of the partial reduction after all the 
linkable simple steps with the information obtained 
from all the characterizations derived from the 
complex steps, to obtain the same result as in a 
classic data reduction (thick line in Fig. 2). 
However, instead of trying to obtain completely 
reduced images ready for starting the analysis work, 
one can directly feed a clever analysis tool with the 
end products of the modified reduction procedure (see 
Figure 3). Obviously, this clever analysis tool has to 
perform its task taking into account that some 
reduction steps have not been performed. For instance, 
if one considers the study of a 2D spectroscopic 
image, the analysis tool should use the information 
concerning geometric distortions, wavelength 
calibration scale, differential refraction, etc., to 
obtain, for example, an equivalent width through a 
measurement in the partially reduced image 
(uncorrected for geometric distortions, wavelength 
calibration, etc.). To accomplish this task, it is 
necessary to manipulate the data using a new and 
distorted system of coordinates that must override 
the orthogonal coordinate system defined by the 
physical pixels. It is in this step that the final 
error of the equivalent width should be obtained. It 
is important to highlight that, in this situation, 
such error estimation should not be a complex task, 
since the analysis tool is supposed to be handling 
uncorrelated pixels. 
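
As a simplified stand-in for a measurement such as an equivalent width, the sketch below integrates the flux of a one-dimensional spectrum between two wavelengths directly on the original pixel grid. It assumes that pixel i covers the interval [i - 0.5, i + 0.5], that each pixel value is the total flux collected in that pixel, that the wavelength solution is a monotonic polynomial supplied as a characterization, and that pixel errors are uncorrelated; all names and values are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def wavelength_to_pixel(solution, wavelength, n_pix):
    """Invert a monotonic wavelength solution lambda(p) inside the detector."""
    roots = (solution - wavelength).roots()
    real = roots[np.isreal(roots)].real
    inside = real[(real >= -0.5) & (real <= n_pix - 0.5)]
    return inside[0]


def band_flux(flux, var, solution, lam_min, lam_max):
    """Sum the flux between two wavelengths without resampling the data.

    Edge pixels enter with fractional weights. Because the pixels are
    uncorrelated, the variance of the sum is sum(weights**2 * var).
    """
    n_pix = flux.size
    p_lo = wavelength_to_pixel(solution, lam_min, n_pix)
    p_hi = wavelength_to_pixel(solution, lam_max, n_pix)
    left = np.arange(n_pix) - 0.5
    right = left + 1.0
    # Fraction of each pixel that falls inside [p_lo, p_hi].
    overlap = np.minimum(right, p_hi) - np.maximum(left, p_lo)
    weights = np.clip(overlap, 0.0, 1.0)
    total = np.sum(weights * flux)
    error = np.sqrt(np.sum(weights**2 * var))
    return total, error


# Hypothetical spectrum: 100 pixels, constant flux and variance,
# linear wavelength solution in angstrom.
solution = Polynomial([5000.0, 2.0])
flux = np.full(100, 10.0)
var = np.full(100, 4.0)
total, error = band_flux(flux, var, solution, 5100.5, 5106.0)
```

The wavelength limits are mapped into pixel coordinates rather than the pixels being mapped into wavelength, so no covariance terms appear in the error of the sum.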

The described reduction philosophy will be 
incorporated into the pipeline data reduction for ELMER 
(http://www.gtc.iac.es/instrumcntation/clmcrj3.asp) 
and EMIR (http://www.ucm.es/info/cmir). 



Financial support for this research has been 
provided by the Spanish Programa Nacional de 
Astronomía y Astrofísica under grants AYA2000-977 
and AYA2000-1790. This work has benefited from 
the experience of reducing NIRSPEC data obtained 
at Keck II. In this sense, N.C. acknowledges 
partial financial support from a UCM Fundación del 
Amo Fellowship and a short contract at UCSC. 



N. Cardiel, J. Gorgas, J. Gallego, A. Serrano, and J. Zamorano: Departamento de Astrofísica, Facultad de 
Físicas, Universidad Complutense de Madrid, 28040 Madrid, Spain (ncl@astrax.fis.ucm.es). 

M. L. García-Vargas, P. Gómez-Cambronero, and J. M. Filgueira: Gran Telescopio Canarias (GTC), C/ Vía 
Láctea s/n (Instituto de Astrofísica de Canarias), 38200 La Laguna (Tenerife), Spain. 



